Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
In recent years, molecular graph representation learning (GRL) has drawn much more attention in molecular property prediction (MPP) problems. The existing graph methods have demonstrated that 3D geometric information is significant for better performance in MPP. However, accurate 3D structures are often costly and time-consuming to obtain, limiting the large-scale application of GRL. It is an intuitive solution to train with 3D to 2D knowledge distillation and predict with only 2D inputs. But some challenging problems remain open for 3D to 2D distillation. One is that the 3D view is quite distinct from the 2D view, and the other is that the gradient magnitudes of atoms in distillation are discrepant and unstable due to the variable molecular size. To address these challenging problems, we exclusively propose a distillation framework that contains global molecular distillation and local atom distillation. We also provide a theoretical insight to justify how to coordinate atom and molecular information, which tackles the drawback of variable molecular size for atom information distillation. Experimental results on two popular molecular datasets demonstrate that our proposed model achieves superior performance over other methods. Specifically, on the largest MPP dataset PCQM4Mv2 served as an "ImageNet Large Scale Visual Recognition Challenge" in the field of graph ML, the proposed method achieved a 6.9% improvement compared with the best works. And we obtained fourth place with the MAE of 0.0734 on the test-challenge set for OGB-LSC 2022 Graph Regression Task. We will release the code soon.
translated by 谷歌翻译
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image. High-fidelity 3D GAN inversion is inherently challenging due to the geometry-texture trade-off in 3D inversion, where overfitting to a single view input image often damages the estimated geometry during the latent optimization. To solve this challenge, we propose a novel pipeline that builds on the pseudo-multi-view estimation with visibility analysis. We keep the original textures for the visible parts and utilize generative priors for the occluded parts. Extensive experiments show that our approach achieves advantageous reconstruction and novel view synthesis quality over state-of-the-art methods, even for images with out-of-distribution textures. The proposed pipeline also enables image attribute editing with the inverted latent code and 3D-aware texture modification. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
translated by 谷歌翻译
In this paper, we study the problem of visual grounding by considering both phrase extraction and grounding (PEG). In contrast to the previous phrase-known-at-test setting, PEG requires a model to extract phrases from text and locate objects from images simultaneously, which is a more practical setting in real applications. As phrase extraction can be regarded as a $1$D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction. Each pair of dual queries is designed to have shared positional parts but different content parts. Such a design effectively alleviates the difficulty of modality alignment between image and text (in contrast to a single query design) and empowers Transformer decoder to leverage phrase mask-guided attention to improve performance. To evaluate the performance of PEG, we also propose a new metric CMAP (cross-modal average precision), analogous to the AP metric in object detection. The new metric overcomes the ambiguity of Recall@1 in many-box-to-one-phrase cases in phrase grounding. As a result, our PEG pre-trained DQ-DETR establishes new state-of-the-art results on all visual grounding benchmarks with a ResNet-101 backbone. For example, it achieves $91.04\%$ and $83.51\%$ in terms of recall rate on RefCOCO testA and testB with a ResNet-101 backbone. Code will be availabl at \url{https://github.com/IDEA-Research/DQ-DETR}.
translated by 谷歌翻译
Conceptual knowledge is fundamental to human cognition and knowledge bases. However, existing knowledge probing works only focus on evaluating factual knowledge of pre-trained language models (PLMs) and ignore conceptual knowledge. Since conceptual knowledge often appears as implicit commonsense behind texts, designing probes for conceptual knowledge is hard. Inspired by knowledge representation schemata, we comprehensively evaluate conceptual knowledge of PLMs by designing three tasks to probe whether PLMs organize entities by conceptual similarities, learn conceptual properties, and conceptualize entities in contexts, respectively. For the tasks, we collect and annotate 24k data instances covering 393 concepts, which is COPEN, a COnceptual knowledge Probing bENchmark. Extensive experiments on different sizes and types of PLMs show that existing PLMs systematically lack conceptual knowledge and suffer from various spurious correlations. We believe this is a critical bottleneck for realizing human-like cognition in PLMs. COPEN and our codes are publicly released at https://github.com/THU-KEG/COPEN.
translated by 谷歌翻译
由于自我批判性和歧义,了解动态的手动运动和动态动作是一项基本而又具有挑战性的任务。为了解决遮挡和歧义,我们开发了一个基于变压器的框架来利用时间信息以进行稳健的估计。注意到手部姿势估计和动作识别之间的不同时间粒度和语义相关性,我们建立了一个网络层次结构,其中有两个级联变压器编码器,其中第一个利用了短期的时间cue进行手姿势估算,而后者则每次聚集物,后者每次聚集体 - 帧姿势和对象信息在更长的时间范围内识别动作。我们的方法在两个第一人称手动作基准(即FPHA和H2O)上取得了竞争成果。广泛的消融研究验证了我们的设计选择。我们将开放源代码和数据以促进未来的研究。
translated by 谷歌翻译
在这项工作中,我们研究了基于价值的深钢筋学习(DRL)中简单但普遍适用的奖励成型案例。我们表明,线性转换形式的奖励转移等同于更改函数近似中$ q $ function的初始化。基于这样的等价性,我们带来了关键的见解,即积极的奖励转移会导致保守的剥削,而负面的奖励转移会导致好奇心驱动的探索。因此,保守的剥削改善了离线RL价值估计,乐观的价值估计改善了在线RL的勘探。我们验证了对一系列RL任务的见解,并显示了其对基准的改进:(1)在离线RL中,保守的剥削可根据现成的算法提高性能; (2)在在线连续控制中,具有不同转移常数的多个值函数可用于应对探索 - 诠释困境,以提高样品效率; (3)在离散控制任务中,负奖励转移可以改善基于好奇心的探索方法。
translated by 谷歌翻译
我们为致密氢的方程式提供了基于深层生成模型的变化自由能方法。我们采用归一化流网络来对质子玻尔兹曼分布和费米子神经网络进行建模,以在给定的质子位置对电子波函数进行建模。通过共同优化两个神经网络,我们达到了与先前的电子蒙特卡洛计算相当的变异自由能。我们的结果表明,与先前的蒙特卡洛和从头算分子动力学数据相比,行星条件下的氢甚至更浓密,这远离经验化学模型的预测。获得可靠的密集氢状态方程,尤其是直接进入熵和自由能,为行星建模和高压物理学研究开辟了新的机会。
translated by 谷歌翻译
整合多个在线社交网络(OSN)对许多下游社交挖掘任务(例如用户偏好建模,建议和链接预测)具有重要意义。但是,不幸的是,伴随着越来越多的隐私问题,泄漏敏感用户信息。如何完全利用来自不同在线社交网络的数据,同时保存用户隐私仍然无法解决。为此,我们提出了一个跨网络的社交用户嵌入框架,即DP-Crosue,以一种隐私性的方式学习用户的全面表示。我们共同考虑具有不同隐私保证的部分调整社交网络的信息。特别是,对于每个异质社交网络,我们首先引入一个混合差异隐私概念,以捕获异构数据类型的隐私期望的变化。接下来,为了找到跨社交网络的用户链接,我们进行了无监督的基于用户嵌入的对齐方式,其中通过异质网络嵌入技术实现了用户嵌入。为了进一步增强用户嵌入,一种新颖的跨网络GCN嵌入模型旨在通过那些对齐用户跨网络传输知识。在三个现实世界数据集上进行的广泛实验表明,我们的方法对用户兴趣预测任务以及捍卫用户属性推理攻击的嵌入进行了重大改进。
translated by 谷歌翻译